Massive Technical Interviews Tips: Java IO Utils

Saturday, October 24, 2015

Java IO Utils

https://stackoverflow.com/questions/35789253/how-to-read-from-gzipinputstream
We use GZIPInputStream to read gzip files and ZipInputStream to read zip files.

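import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// responseBytes is assumed to hold gzip-compressed UTF-8 text (inside a method that throws IOException).
ByteArrayInputStream bais = new ByteArrayInputStream(responseBytes);
// try-with-resources closes the reader and, through it, the GZIP and byte streams
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(new GZIPInputStream(bais), StandardCharsets.UTF_8))) {
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
}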
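For a self-contained check, here is a minimal round-trip sketch (the class and method names are hypothetical, not from the linked answer): it compresses a string with GZIPOutputStream and reads it back with GZIPInputStream, assuming the text is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    // Compresses UTF-8 text into gzip bytes.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // closing the GZIPOutputStream writes the gzip trailer (CRC and size)
        try (GZIPOutputStream gzos = new GZIPOutputStream(bos)) {
            gzos.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Decompresses gzip bytes back into UTF-8 text.
    static String gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

For example, GzipRoundTrip.gunzip(GzipRoundTrip.gzip("hello")) gives back "hello".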

http://www.adam-bien.com/roller/abien/entry/reading_inputstream_into_string_with
READING INPUTSTREAM INTO STRING WITH JAVA 8

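    // needs java.io.*, java.nio.charset.StandardCharsets, java.util.stream.Collectors
    public static String read(InputStream input) throws IOException {
        // decode explicitly as UTF-8 instead of relying on the platform default charset
        try (BufferedReader buffer = new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8))) {
            // lines() strips line terminators, so "\r\n" becomes "\n" and a trailing newline is lost
            return buffer.lines().collect(Collectors.joining("\n"));
        }
    }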
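Because lines() splits on line terminators, the Java 8 approach above normalizes "\r\n" to "\n" and drops a trailing newline. When the exact text matters, one option is to copy the raw bytes and decode them once. A sketch with hypothetical names, assuming UTF-8 input:

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class StreamToString {
    // Line-based version: terminators are normalized to "\n" and the last one is dropped.
    static String readLines(InputStream input) throws IOException {
        try (BufferedReader buffer = new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8))) {
            return buffer.lines().collect(Collectors.joining("\n"));
        }
    }

    // Byte-exact version: copies every byte, then decodes once as UTF-8.
    // On Java 9+, input.readAllBytes() can replace the copy loop.
    static String readExact(InputStream input) throws IOException {
        try (InputStream in = input) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

With the input "a\r\nb\n", readLines returns "a\nb" while readExact returns the original "a\r\nb\n".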

http://www.thecoderscorner.com/team-blog/java-and-jvm/12-reading-a-zip-file-from-java-using-zipinputstream/
First, create a ZipInputStream over the file you want to expand. Then iterate with the stream's getNextEntry() method, which returns the header data (a ZipEntry) for each entry in turn. Importantly, the entry does not contain the data; the data is read from the stream itself, separately.
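The header-then-data pattern can be sketched like this. The archive is built in memory with ZipOutputStream only so the example runs on its own; the class and method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipListing {
    // Builds a small zip in memory: one directory entry and one file entry.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("docs/"));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("docs/readme.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    // getNextEntry() returns only the header (name, type, ...);
    // the entry's bytes are then read from the ZipInputStream itself.
    static List<String> describe(byte[] zipBytes) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    result.add(entry.getName() + " (dir)");
                } else {
                    ByteArrayOutputStream data = new ByteArrayOutputStream();
                    byte[] buf = new byte[1024];
                    int n;
                    // read() returns -1 at the end of the current entry, not of the whole archive
                    while ((n = zin.read(buf)) != -1) {
                        data.write(buf, 0, n);
                    }
                    result.add(entry.getName() + " = " + new String(data.toByteArray(), StandardCharsets.UTF_8));
                }
                zin.closeEntry();
            }
        }
        return result;
    }
}
```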

http://www.codejava.net/java-se/file-io/programmatically-extract-a-zip-file-using-java
The java.util.zip package provides the following classes for extracting files and directories from a ZIP archive:

- ZipInputStream: this is the main class for reading a zip file and extracting the files and directories (entries) within the archive. Important uses of this class:
  - open a zip via its constructor ZipInputStream(InputStream), typically wrapping a FileInputStream
  - iterate over file and directory entries via getNextEntry()
  - read the binary data of the current entry via read(byte[])
  - close the current entry via closeEntry()
  - close the zip file via close()
- ZipEntry: this class represents an entry in the zip file. Each file or directory is represented as a ZipEntry object. Its method getName() returns a String representing the path of the file or directory, in the following form:
  folder_1/subfolder_1/subfolder_2/…/subfolder_n/file.ext

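import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipUtility {

    /** Size of the buffer used to copy each entry's bytes. */
    private static final int BUFFER_SIZE = 4096;

    public void unzip(String zipFilePath, String destDirectory) throws IOException {
        File destDir = new File(destDirectory);
        if (!destDir.exists()) {
            destDir.mkdirs();
        }
        String destPath = destDir.getCanonicalPath() + File.separator;
        // try-with-resources closes the zip stream even if extraction fails
        try (ZipInputStream zipIn = new ZipInputStream(new FileInputStream(zipFilePath))) {
            ZipEntry entry = zipIn.getNextEntry();
            // iterates over entries in the zip file
            while (entry != null) {
                File target = new File(destDir, entry.getName());
                // reject entries such as "../evil.txt" that would escape destDirectory ("zip slip")
                if (!target.getCanonicalPath().startsWith(destPath)) {
                    throw new IOException("Entry is outside the target dir: " + entry.getName());
                }
                if (!entry.isDirectory()) {
                    // if the entry is a file, create its parent directories, then extract it
                    target.getParentFile().mkdirs();
                    extractFile(zipIn, target.getPath());
                } else {
                    // if the entry is a directory, make it (and any missing parents)
                    target.mkdirs();
                }
                zipIn.closeEntry();
                entry = zipIn.getNextEntry();
            }
        }
    }

    /**
     * Extracts a zip entry (file entry)
     * @param zipIn stream positioned at the start of the entry's data
     * @param filePath destination file path
     * @throws IOException
     */
    private void extractFile(ZipInputStream zipIn, String filePath) throws IOException {
        try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(filePath))) {
            byte[] bytesIn = new byte[BUFFER_SIZE];
            int read;
            // read() returns -1 at the end of the current entry
            while ((read = zipIn.read(bytesIn)) != -1) {
                bos.write(bytesIn, 0, read);
            }
        }
    }
}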
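On Java 7+, the same extraction can be written more compactly with java.nio.file. This is a sketch, not the codejava.net code: NioUnzip is a hypothetical name, and the "zip slip" check via Path.normalize() and Path.startsWith() is one common way to keep entries inside the destination directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class NioUnzip {
    // Extracts every entry of zipFile into destDir.
    static void unzip(Path zipFile, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (InputStream fileIn = Files.newInputStream(zipFile);
             ZipInputStream zipIn = new ZipInputStream(fileIn)) {
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // reject entries such as "../evil.txt" that would land outside destDir
                if (!target.startsWith(root)) {
                    throw new IOException("Bad zip entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // copies only up to the end of the current entry; zipIn stays open
                    Files.copy(zipIn, target, StandardCopyOption.REPLACE_EXISTING);
                }
                zipIn.closeEntry();
            }
        }
    }
}
```

Files.copy() does not close zipIn; it stops at the end of the current entry, so the loop can continue with getNextEntry().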

http://blog.csdn.net/tiwerbao/article/details/35388003

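To check whether two File objects refer to the same file, we usually use java.io.File.equals(). This equals() does not compare the contents of the files; it checks whether the two File objects point to the same file.

Under the hood, File.equals() calls File.compareTo(), which delegates to compare() of the current platform's FileSystem: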

public boolean equals(Object obj) {  
    if ((obj != null) && (obj instanceof File)) {  
        return compareTo((File)obj) == 0;  
    }  
    return false;  
}  
static private FileSystem fs = FileSystem.getFileSystem();  
public int compareTo(File pathname) {  
    return fs.compare(this, pathname);  
}  

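java.io.FileSystem has a platform-specific implementation: on Windows it is Win32FileSystem (WinNTFileSystem in newer JDKs), while Unix/Linux JDKs use UnixFileSystem, shown further below. The comparison is really just a comparison of path strings: if two File objects have the same getPath(), they are considered the same file. Note that getPath() is the path as it was given, not necessarily an absolute path. The code also shows that the Windows comparison is case-insensitive.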

Here is java.io.Win32FileSystem.compare():

public int compare(File f1, File f2) {
    return f1.getPath().compareToIgnoreCase(f2.getPath());
}

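Checking whether two objects point to the same file by comparing their paths works in most cases, but you have to be careful. For example, file names on Linux are case-sensitive, so case must not be ignored there. Also, a file reached through a hard link is really the same file, yet File.equals() returns false for it.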
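Since File.equals() only compares path strings, two spellings of the same path already count as "different" files. A small sketch (hypothetical helper names) contrasting it with a comparison of canonical paths:

```java
import java.io.File;
import java.io.IOException;

class FileEqualsDemo {
    // File.equals(): compares the abstract pathnames only, without touching the file system.
    static boolean sameByEquals(File a, File b) {
        return a.equals(b);
    }

    // Canonical comparison: removes "." and ".." and resolves symbolic links (on Unix)
    // before comparing, but two hard links still have different canonical paths.
    static boolean sameByCanonicalPath(File a, File b) throws IOException {
        return a.getCanonicalFile().equals(b.getCanonicalFile());
    }
}
```

Given an existing directory dir containing notes.txt and a subdirectory sub, new File(dir, "notes.txt") and new File(dir, "sub/../notes.txt") are unequal under equals() but equal after canonicalization.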

java.io.UnixFileSystem.compare(File, File):

public int compare(File f1, File f2) {
    return f1.getPath().compareTo(f2.getPath());
}

That is why JDK 1.7 introduced the utility class java.nio.file.Files, whose isSameFile() checks whether two paths point to the same file.

// java.nio.file.Files
public static boolean isSameFile(Path path, Path path2) throws IOException {
    return provider(path).isSameFile(path, path2);
}
private static FileSystemProvider provider(Path path) {
    return path.getFileSystem().provider();
}

// sun.nio.fs.UnixFileSystemProvider (simplified: null/type checks and UnixException handling omitted)
public boolean isSameFile(Path obj1, Path obj2) throws IOException {
    UnixPath file1 = UnixPath.toUnixPath(obj1);
    if (file1.equals(obj2))
        return true;
    UnixPath file2 = (UnixPath) obj2;
    // security-manager read checks on both paths
    file1.checkRead();
    file2.checkRead();
    UnixFileAttributes attrs1 = UnixFileAttributes.get(file1, true);
    UnixFileAttributes attrs2 = UnixFileAttributes.get(file2, true);
    return attrs1.isSameFile(attrs2);
}

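It first calls UnixPath.equals(), then performs read-permission checks on both paths, and finally calls UnixFileAttributes.isSameFile(). In other words, it first checks whether the two paths are identical (case-sensitive); if they are, the files are considered the same. If not, it compares the files' inode numbers (together with the device ID). This reflects how Unix file systems work: a file is identified by its inode, so paths with the same inode number on the same device point to the same file. That is why this method can tell that two hard links point to the same file.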
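From application code, a similar device-plus-inode identity is exposed through BasicFileAttributes.fileKey(), which returns an object that uniquely identifies the file, or null if the platform does not provide one (it may be null on some platforms, such as Windows). A sketch with hypothetical names, assuming a Unix-like OS and a file system that supports hard links:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

class FileKeyDemo {
    // Returns the file key, or null if this platform has none.
    // On Unix-like systems OpenJDK builds it from the device ID and inode number.
    static Object fileKey(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class).fileKey();
    }

    // Two paths name the same file if their (non-null) file keys are equal.
    static boolean sameInode(Path a, Path b) throws IOException {
        Object ka = fileKey(a);
        return ka != null && ka.equals(fileKey(b));
    }
}
```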
UnixPath

public boolean equals(Object ob) {
    if ((ob != null) && (ob instanceof UnixPath))
        return compareTo((Path) ob) == 0; // compare the two paths byte by byte
    return false;
}
public int compareTo(Path other) {
    int len1 = path.length;
    int len2 = ((UnixPath) other).path.length;
    int n = Math.min(len1, len2);
    byte v1[] = path;
    byte v2[] = ((UnixPath) other).path;
    int k = 0;
    while (k < n) {
        int c1 = v1[k] & 0xff;
        int c2 = v2[k] & 0xff;
        if (c1 != c2)
            return c1 - c2;
        k++;
    }
    return len1 - len2;
}

UnixFileAttributes

boolean isSameFile(UnixFileAttributes attrs) {
    return ((st_ino == attrs.st_ino) && (st_dev == attrs.st_dev));
}

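Windows works much the same way; see WindowsFileSystemProvider.isSameFile(), WindowsPath.equals(), and WindowsFileAttributes.isSameFile().

They first compare the paths (ignoring case) and treat the files as the same if the paths are equal; otherwise they fall back to a low-level check, which on Windows compares the volume serial number and the file index of the two files.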

WindowsFileAttributes

static boolean isSameFile(WindowsFileAttributes attrs1, WindowsFileAttributes attrs2) {
    // volume serial number and file index must be the same
    return (attrs1.volSerialNumber == attrs2.volSerialNumber) &&
           (attrs1.fileIndexHigh == attrs2.fileIndexHigh) &&
           (attrs1.fileIndexLow == attrs2.fileIndexLow);
}

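If you only want to check whether two File objects have the same path (not the same content), File.equals() is safe to use. If you want to know whether they refer to the same file in the OS, use Files.isSameFile(), which accounts for the differences between file systems. Comparing how the two platforms implement these checks also gives a glimpse of how their file systems differ.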
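The difference is easy to see with a hard link. A minimal sketch (class and method names are hypothetical; it assumes the file system supports hard links via Files.createLink):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SameFileDemo {
    // Hypothetical helper: creates a hard link named linkName next to original.
    static Path hardLink(Path original, String linkName) throws IOException {
        return Files.createLink(original.resolveSibling(linkName), original);
    }

    // Path-string comparison, exactly what File.equals() does.
    static boolean equalByPath(Path a, Path b) {
        return a.toFile().equals(b.toFile());
    }

    // OS-level identity (device + inode on Unix, volume + file index on Windows).
    static boolean sameOsFile(Path a, Path b) throws IOException {
        return Files.isSameFile(a, b);
    }
}
```

For a file and a hard link to it, equalByPath returns false while sameOsFile returns true.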
http://www.geeksforgeeks.org/fast-io-in-java-in-competitive-programming/
Scanner class (easy, less typing, but not recommended because it is very slow): in many cases using the Scanner class leads to TLE (Time Limit Exceeded). After creating a Scanner over an input stream (e.g. System.in), you read values with its built-in nextInt(), nextLong(), and nextDouble() methods.

java.util.Scanner class is a simple text scanner which can parse primitive types and strings. It internally uses regular expressions to read different types.

The java.io.BufferedReader class reads text from a character-input stream, buffering characters to provide efficient reading of characters, arrays, and lines.
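A common way to get fast input in Java is to read whole lines with BufferedReader and split them with StringTokenizer. The sketch below is one such reader; the FastReader name is an assumption, not a JDK class, and it uses the platform default charset, which is fine for ASCII numbers:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

class FastReader {
    private final BufferedReader br;
    private StringTokenizer st;

    FastReader(InputStream in) {
        br = new BufferedReader(new InputStreamReader(in));
    }

    // Returns the next whitespace-separated token, or null at end of input.
    String next() throws IOException {
        while (st == null || !st.hasMoreTokens()) {
            String line = br.readLine();
            if (line == null) {
                return null;
            }
            st = new StringTokenizer(line); // splits on spaces, tabs, and line breaks
        }
        return st.nextToken();
    }

    int nextInt() throws IOException {
        return Integer.parseInt(next());
    }

    long nextLong() throws IOException {
        return Long.parseLong(next());
    }
}
```

In a contest solution you would create it once with new FastReader(System.in) and call nextInt() or nextLong() as needed.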

http://www.geeksforgeeks.org/difference-between-scanner-and-bufferreader-class-in-java/

In the Scanner class, if we call nextLine() after any of the seven nextXXX() methods, nextLine() does not read a new line from the console: it returns immediately (usually with an empty string), so the program seems to skip that input step. The nextXXX() methods are nextInt(), nextFloat(), nextByte(), nextShort(), nextDouble(), nextLong(), and next().

The BufferedReader class has no such problem. It occurs only with Scanner, because the nextXXX() methods leave the newline that follows the token unread, and the next nextLine() call reads only that leftover newline. Adding one extra nextLine() call between nextXXX() and the real nextLine() fixes the problem, because it consumes the newline. This is the same problem as scanf() followed by gets() in C/C++.
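The pitfall and its fix can be reproduced without a console by scanning a String. A sketch with hypothetical method names:

```java
import java.util.Scanner;

class ScannerNewlineDemo {
    // Pitfall: nextLine() right after nextInt() returns the rest of the number's line,
    // which is "" when only the newline is left.
    static String lineAfterInt(String input) {
        try (Scanner sc = new Scanner(input)) {
            sc.nextInt();
            return sc.nextLine();
        }
    }

    // Fix: one extra nextLine() consumes the leftover newline first.
    static String intThenLine(String input) {
        try (Scanner sc = new Scanner(input)) {
            int number = sc.nextInt();
            sc.nextLine();               // discard the rest of the number's line
            String text = sc.nextLine(); // now reads the real next line
            return number + ":" + text;
        }
    }
}
```

With the input "42\nhello\n", lineAfterInt returns "" and intThenLine returns "42:hello".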

BufferedReader is synchronized while Scanner is not, so BufferedReader should be used when multiple threads read from the same stream.
BufferedReader has a significantly larger buffer than Scanner: 8192 characters by default, versus Scanner's 1024-character buffer, although Scanner's buffer is usually more than enough.
BufferedReader is faster than Scanner because Scanner parses the input data (using regular expressions), while BufferedReader simply reads sequences of characters.
