Mashhood Owais
It is with sorrow that we record that on 18 Ramadan al-Mubarak, Mashhood Owais Sahib, an old and sincere servant of Darul Musannefin, answered the call of death. His father, Maulvi Muhammad Owais Sahib, was among the early builders of Darul Musannefin and for a long time was also in charge of its office and press. Mashhood Owais likewise carried out office duties until his last breath, and came to the office even on the day of his death. He was a very cheerful man who took pleasure in doing work for others. His health had been poor for years, and he suffered from asthma. May Allah Almighty grant him forgiveness and grant his relatives patience. Amen. (Ziauddin Islahi, May 1989)
Islam's code of conduct is impartial and firm. All human beings, disabled or not, have the right to receive justice and the duty to render it. Partial treatment on the grounds of disability is contrary to the Islamic code of justice. "Good behavior" toward disabled persons is the basic law concerning disability in Islam. This "good behavior" is not the product of mercy or pity; it follows from the right of disabled persons to a life equal to that of ordinary people, a right owed to them by the nation and the community. Hence the rights of disabled persons follow from the needs that disability creates. The conventional meaning of disability is "complete worthlessness", but in Islam it means weakness or feebleness. That is, a disabled person is able to work but has less ability, or lacks the ability to do one job but has a high ability to do others. The Holy Prophet (Peace be upon him) was the first to introduce this principle of disability. The Arabs took the word "disability" to mean complete uselessness. The Holy Quran, however, exempted from Jehad those who did not take part in the Battle of Tabuk because of disability, and called them the feeble. Islam does not appeal for mercy toward disabled persons; it advises treating them well and condemns society's injustice toward them. Islam orders everyone to perform their duties to others, and it not only stresses the performance of these duties but also gives instructions for carrying them out.
The efficient parallelization of the sparse matrix-vector product (SMVP) is of prime importance in scientific computing. To achieve this on distributed-memory computers, we concentrate on minimizing inter-processor communication, balancing the workload, overlapping communication with computation, and optimizing single-processor performance. The thesis consists of two parts, which present the optimization of sparse matrix-vector multiplication on single processors and on multiple processors. To improve SMVP performance on a single scalar processor, we propose two sparse storage formats: grouped compressed row storage with permutation (GCRSP) and blocked compressed row storage with permutation (BCRSP). The proposed formats are designed to exploit the benefits of blocking, such as reduced indirect addressing and increased spatial and temporal locality, while eliminating the corresponding overheads. For good load balancing and low communication cost, reordering sparse matrices according to their sparsity structure is highly important. For this purpose we propose reordering-based partitioning strategies that exploit the sparsity of the input matrix to give a balanced load distribution together with reduced communication cost. GCRSP is observed to improve performance over simple compressed row storage (CRS) and compressed row storage with permutation (CRSP) by an average of 16% and 25%, respectively. Moreover, owing to blocking, BCRSP shows average performance improvements of 32%, 41% and 20% over CRS, CRSP and GCRSP, respectively. Likewise, the proposed partitioning models, based on permuted row/column matrices, produce an average of 49% better load balancing and 14% better communication than the corresponding naïve row/column and checkerboard models.
Moreover, they achieve the same level of load balance and an average of 78% better communication than the corresponding balanced naïve partitioning models, i.e. the row/column block and balanced checkerboard (BCH) models. Overall, using the BCRSP format with permuted row partitioning yields an average performance gain of 30% for parallel SMVP over an implementation using the CRS format with naïve row partitioning on a cluster of eight processors.
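For readers unfamiliar with the baseline, the following is a minimal sketch of SMVP on the plain CRS format, together with a count of nonzeros per processor under naïve contiguous row-block partitioning. It does not implement GCRSP, BCRSP, or the proposed reordering-based partitioning, whose details are not given in this abstract. The function names (`crs_from_dense`, `crs_matvec`, `row_block_loads`) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def crs_from_dense(a: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Build CRS arrays (values, col_idx, row_ptr) from a dense matrix."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks the end of row i in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def crs_matvec(values: Sequence[float], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A x for A stored in CRS."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is accessed indirectly through col_idx; this is the
            # indirect addressing that blocked formats aim to reduce
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def row_block_loads(row_ptr: Sequence[int], parts: int) -> List[int]:
    """Nonzeros per processor when rows are split into contiguous,
    near-equal blocks (naive row partitioning, ignoring sparsity)."""
    n_rows = len(row_ptr) - 1
    bounds = [(p * n_rows) // parts for p in range(parts + 1)]
    return [row_ptr[bounds[p + 1]] - row_ptr[bounds[p]] for p in range(parts)]
```

Because the work in SMVP is proportional to the number of nonzeros, the spread of the values returned by `row_block_loads` gives a rough indication of the load imbalance that a naïve row partitioning leaves for a given matrix.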